(57) An improved technique for compressing a color or gray scale pixel map representing a document using an MRC format includes a method of segmenting an original pixel map into two planes, and then compressing the data of each plane in an efficient manner. The image is segmented by separating the image into two portions at the edges. One plane contains image data for the dark sides of the edges, while image data for the bright sides of the edges and the smooth portions of the image are placed on the other plane. This results in improved image compression ratios and enhanced image quality.
Description

[0001] This invention relates generally to image processing and, more particularly, to techniques for segmenting, classifying and/or compressing the digital representation of a document.

[0002] Documents scanned at high resolutions require very large amounts of storage space. Instead of being stored as is, the data is typically subjected to some form of data compression in order to reduce its volume, and thereby avoid the high costs associated with storing it. "Lossless" compression methods such as Lempel-Ziv Welch (LZW) do not perform particularly well on scanned pixel maps. While "lossy" methods such as JPEG work fairly well on continuous-tone pixel maps, they do not work particularly well on the parts of the page that contain text. To optimize image data compression, techniques which can recognize the type of data being compressed are needed.

[0003] Known compression techniques are described in US-A-577e092, US-A-5251271, US-A-6060980, US-A-57e4175, US-A-5303313 and US-A-5432870.

[0004] In one embodiment, the present invention discloses a method of segmenting a pixel map representation of a document which includes the steps of: acquiring a block of the digital image data, wherein the digital image data is composed of light intensity signals in discrete locations; designating a classification for the block and providing an indication about a context of the block; segmenting the light intensity signals in the block into an upper subset and a lower subset based upon the designated classification; generating a selector set which tracks the light intensity segmentation; and separately compressing the digital image data contained in the upper and lower subsets.

[0005] In another embodiment the present invention discloses a method of classifying a block of digital image data into one of a plurality of image data types, wherein the block of data is composed of light intensity signals in discrete locations, which includes: dividing the block into a bright region and a dark region; dividing a low pass filtered version of the block into a bright region and a dark region; calculating average light intensity values for each of the bright region, the dark region, the filtered bright region and the filtered dark region; and comparing a difference between the bright region and the dark region average light intensity values to a filtered difference between the bright region and the dark region average filtered light intensity values; if the average light intensity difference and the average filtered light intensity difference are approximately equal, finding a range of values in which the difference value falls, and classifying the block based upon the value range; and if the average light intensity difference and the average filtered light intensity difference are not approximately equal, finding a range of values in which the filtered difference value falls and classifying the block based upon the filtered value range.



[0006] Some examples of methods according to the present invention will now be described with reference to the accompanying drawings, in which:

Figure 1 illustrates a composite image and includes an example of how such an image may be decomposed into three MRC image planes: an upper plane, a lower plane, and a selector plane;

Figure 2 contains a detailed view of a pixel map and the manner in which pixels are grouped to form blocks;

Figure 3 contains a flow chart which illustrates, generally, the steps performed to practice the invention;

Figure 4 contains a detailed illustration of the manner in which blocks may be classified according to the present invention;

Figure 5 contains a detailed illustration of the manner in which blocks may be segmented based upon their classification according to the present invention;

Figure 6 contains the details of one embodiment of the manner in which block variation can be measured as required by the embodiment of the invention shown in Figure 4;

Figure 7 contains the details of an embodiment of the invention describing classification of blocks based upon the block variation measurement provided in Figure 6;

Figure 8 contains the details of an embodiment of the invention for which context may be updated based upon the block classification provided in Figure 7; and

Figure 9 contains the details of another embodiment of the invention for updating context based upon block classification as provided in Figure 7.

[0007] The present invention is directed to a method and apparatus for separately processing the various types of data contained in a composite image. While the invention will be described in a Mixed Raster Content (MRC) technique, it may be adapted for use with other methods and apparatus and is not, therefore, limited to an MRC format. The technique described herein is suitable for use in various devices required for storing or transmitting documents such as facsimile devices, image storage devices and the like, and processing of both color and grayscale black and white images is possible.

[0008] A pixel map is one in which each discrete location on the page contains a picture element or "pixel" that emits a light signal with a value that indicates the color or, in the case of gray scale documents, how light or dark the image is at that location. As those skilled in the art will appreciate, most pixel maps have values that are taken from a set of discrete, non-negative integers.
[0009] For example, in a pixel map for a color document, individual separations are often represented as digital values, often in the range 0 to 255, where 0 represents no colorant (i.e. when CMYK separations are used), or the lowest value in the range when luminance-chrominance separations are used, and 255 represents the maximum amount of colorant or the highest value in the range. In a gray-scale pixel map this typically translates to pixel values which range from 0, for black, to 255, for the whitest tone possible. The pixel maps of concern in the preferred embodiment of the present invention are representations of scanned images, that is, images created by digitizing light reflected off of physical media using a digital scanner. The term bitmap is used to mean a binary pixel map in which pixels can take one of two values, 1 or 0.

[0010] Turning now to the drawings for a more detailed description of the MRC format, pixel map 10 representing a color or gray-scale document is preferably decomposed into a three plane page format as indicated in Figure 1. Pixels on pixel map 10 are preferably grouped in blocks 18 (best illustrated in Figure 2), to allow for better image processing efficiency. The document format is typically comprised of an upper plane 12, a lower plane 14, and a selector plane 16. Upper plane 12 and lower plane 14 contain pixels that describe the original image data, wherein pixels in each block 18 have been separated based upon predefined criteria. For example, pixels that have values above a certain threshold may be placed on one plane, while those with values at or below the threshold may be placed on the other plane. Selector plane 16 keeps track of every pixel in original pixel map 10 and maps all pixels to an exact spot on either upper plane 12 or lower plane 14.

[0011] The upper and lower planes are stored at the same bit depth and number of colors as the original pixel map 10 [...]. Selector plane 16 is created and stored as a bitmap. It is important to recognize that while the terms "upper" and "lower" are used to describe the planes on which data resides, it is not intended to limit the invention to any particular arrangement or configuration.
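
The relationship between the three planes can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a selector value of 1 picks the pixel from the upper plane and 0 picks it from the lower plane; the function name `recompose` and the list-of-rows layout are hypothetical and do not come from the specification.

```python
def recompose(upper, lower, selector):
    """Rebuild a pixel map from MRC-style planes.

    Assumption (not from the specification): selector bit 1 means
    "take the pixel from the upper plane", 0 means "take it from the
    lower plane". Unused plane positions may hold any filler value.
    """
    return [
        [u if s == 1 else l for u, l, s in zip(urow, lrow, srow)]
        for urow, lrow, srow in zip(upper, lower, selector)
    ]


upper = [[10, 0], [0, 20]]
lower = [[0, 200], [210, 0]]
selector = [[1, 0], [0, 1]]
image = recompose(upper, lower, selector)
```

Because the selector is a one-bit-per-pixel map, it is the only plane that has to preserve exact pixel positions; the other two planes can carry arbitrary filler wherever the selector points away from them.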
[0012] After processing, all three planes are compressed [...] preferably be [...] be one of the approved [...].

[0013] In the present invention digital image data is preferably processed using an MRC technique such as described above. Pixel map 10 represents a scanned image composed of light intensity signals dispersed throughout the separation at discrete locations. Again, a light signal is emitted from each of these discrete locations, referred to as "picture elements," "pixels" or "pels," at an intensity level which indicates the magnitude of the light being reflected from the original image at the corresponding location in that separation.
[0014] In typical MRC fashion, pixel map 10 must be partitioned into two planes 12 and 14. Figure 3 contains a schematic diagram, which outlines the overall process used to segment pixel map 10 into an upper plane 12 and a lower plane 14 according to the present invention. Block 18 is acquired as indicated in step 210, and is classified as indicated in step 220. In the preferred embodiment of the invention, block 18 will initially be classified as either UNIFORM, SMOOTH, WEAK_EDGE or EDGE, and its context - either TEXT or PICTURE - will be provided. The block will then be reclassified as either SMOOTH or EDGE, depending upon the initial classification and the context. Next, pixels in block 18 are segmented - placed on either upper plane 12 or lower plane 14 according to criteria that are most appropriate for the manner in which the block has been classified, as indicated in step 230. This process is repeated for each block 18 in original pixel map 10 until the entire pixel map 10 has been processed. Upper plane 12, lower plane 14 and selector plane 16 are then separately compressed, using a technique that is most suitable for the type of data contained on each, as indicated in step 240.
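
The block-by-block traversal described above can be sketched in Python as follows. The block size, the function name `iter_blocks`, and the handling of partial blocks at the borders are assumptions made for illustration; the specification does not fix any of them.

```python
def iter_blocks(pixel_map, size):
    """Yield (top, left, block) for each size x size tile of pixel_map.

    Tiles on the right and bottom borders may be smaller when the map
    dimensions are not multiples of size (an assumed convention).
    """
    height, width = len(pixel_map), len(pixel_map[0])
    for top in range(0, height, size):
        for left in range(0, width, size):
            block = [row[left:left + size] for row in pixel_map[top:top + size]]
            yield top, left, block


# A 4x4 pixel map split into four 2x2 blocks.
pixel_map = [[r * 4 + c for c in range(4)] for r in range(4)]
blocks = list(iter_blocks(pixel_map, 2))
```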
[0015] Turning now to Figure 4, generally speaking, classification of blocks 18 into one of the four categories in step 220 as described above is preferably completed in three steps. First, the variation of pixel values within the block is determined as indicated in step 310. Block variation is best determined by using statistical measures, which will be described in detail below with reference to Figure 6. Blocks with large variations throughout are most likely to actually lie along edges of the image, while those containing little variation probably lie in uniform or at least smooth areas. Measuring the variations within the block allows an initial classification to be assigned to it as indicated in step 320. Next, image data within each block 18 is reviewed in detail to allow context information (i.e. whether the region is in the text or picture region of the image) to be updated and any necessary block re-classifications to be performed as shown in step 330. The UNIFORM blocks are reclassified as SMOOTH, and the WEAK EDGE blocks are upgraded to EDGE in a TEXT context or reclassified as SMOOTH in a PICTURE context. A smoothed version 20 of the image is also provided by applying a low pass filter to the pixel map 10. Smoothed image 20 is used in conjunction with original image data to offer additional information during classification, and also provides unscreened data for halftone regions.
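
The re-classification rule in the paragraph above reduces to a small lookup, sketched here in Python. The string labels and the function name `reclassify` are illustrative choices, not names used by the specification.

```python
def reclassify(initial_class, context):
    """Map an initial class to SMOOTH or EDGE using the block context.

    initial_class: "UNIFORM", "SMOOTH", "WEAK_EDGE" or "EDGE"
    context: "TEXT" or "PICTURE"
    """
    if initial_class in ("UNIFORM", "SMOOTH"):
        return "SMOOTH"
    if initial_class == "EDGE":
        return "EDGE"
    if initial_class == "WEAK_EDGE":
        # Weak edges are trusted only where text is expected.
        return "EDGE" if context == "TEXT" else "SMOOTH"
    raise ValueError(f"unknown class: {initial_class}")
```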
[0016] Figure 5 contains details of the manner in which block 18 is segmented into two planes, as provided in step 230 of Figure 3. The measurement begins by






EP 1 006 716 A2 



first determining at step 410 whether the block being processed has initially been classified as an EDGE in step 220. If so, the values Vp of each pixel in the block are first compared to a brightness threshold value ts, wherein pixels that have values equal to or above ts are viewed as "bright" pixels, while those with values below ts are "dark" pixels. Segmenting EDGE blocks simply includes placing dark pixels on upper plane 12 as indicated in step 440, and placing bright pixels on lower plane 14 as indicated in step 450. If it is determined at step 410 that block 18 is not an EDGE, all pixels in the block are processed together, rather than on a pixel by pixel basis. Segmenting of SMOOTH (non-EDGE) pixels occurs as follows: if block 18 is in the midst of a short run of blocks that have been classified as SMOOTH, and further, all blocks in this short run are dark (Vp < ts), all data in the block is placed on upper plane 12. If the entire block 18 is substantially smooth (i.e. in a long run) or is bright (in a short run of bright pixels), all data in block 18 is placed on lower plane 14.
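
A minimal Python sketch of this segmentation step follows. Several details are assumptions made only for illustration: unused plane positions are filled with `None`, a selector bit of 1 marks a pixel stored on the upper plane, and the run analysis for non-EDGE blocks is collapsed into a single boolean argument `in_short_dark_run` that the caller is expected to compute.

```python
def segment_block(block, block_class, t_s, in_short_dark_run=False):
    """Split one block into (upper, lower, selector) lists of rows.

    Assumptions: unused plane positions hold None, and selector bit 1
    means the pixel was placed on the upper plane.
    """
    upper, lower, selector = [], [], []
    for row in block:
        if block_class == "EDGE":
            # Per-pixel decision: dark pixels (value < t_s) go to the upper plane.
            to_upper = [v < t_s for v in row]
        else:
            # Non-EDGE blocks are moved as a whole.
            to_upper = [in_short_dark_run] * len(row)
        upper.append([v if up else None for v, up in zip(row, to_upper)])
        lower.append([None if up else v for v, up in zip(row, to_upper)])
        selector.append([1 if up else 0 for up in to_upper])
    return upper, lower, selector


up, lo, sel = segment_block([[30, 220], [40, 128]], "EDGE", t_s=128)
```

In a real encoder the `None` positions would typically be filled with values that compress well, since the selector guarantees they are never read back.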
[0017] Turning now to Figure 6, the details of one embodiment of the invention wherein initial block classification via block variation measurement may be accomplished as required by step 310 (Figure 4) are now described. A threshold, ts, which allows the block to be divided into two portions is first calculated as indicated in step 510. In the preferred embodiment of the invention, this threshold is obtained by performing a histogram analysis on the data in the block, but many standard methods can be used to perform this analysis. For example, the value that maximizes between distances of the criteria being used for separation, or provides for maximum separation between the two portions of the block, can be selected. Those skilled in the art will recognize that other methods of choosing the best threshold are available and the invention is not limited to this embodiment. Block 18 is then thresholded into these two parts by comparing the light intensity value of each pixel to the selected threshold ts, as indicated in step 520. As before, if the pixel value Vp is less than the threshold, the pixel is referred to as dark. If Vp is greater than or equal to ts, the pixel is bright.
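
One standard histogram-based way to pick a threshold that maximizes the separation between the two portions is Otsu's method, which maximizes the between-class variance. The Python sketch below uses it as a stand-in; the specification does not name Otsu's method, and the function name and the 256-level assumption are illustrative.

```python
def otsu_threshold(pixels, levels=256):
    """Return t such that dark = value < t and bright = value >= t,
    chosen to maximise the between-class variance (Otsu's method)."""
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    dark_count, dark_sum = 0, 0
    for t in range(1, levels):
        # Move intensity level t-1 into the dark class.
        dark_count += hist[t - 1]
        dark_sum += (t - 1) * hist[t - 1]
        bright_count = total - dark_count
        if dark_count == 0 or bright_count == 0:
            continue
        mean_dark = dark_sum / dark_count
        mean_bright = (total_sum - dark_sum) / bright_count
        var_between = dark_count * bright_count * (mean_dark - mean_bright) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


block = [20, 22, 25, 200, 210, 205, 21, 208]  # flattened block values
t_s = otsu_threshold(block)
dark = [v for v in block if v < t_s]
bright = [v for v in block if v >= t_s]
```

For a block whose pixels are all identical the function returns 0, so every pixel counts as bright; callers may want to treat that case as UNIFORM before thresholding.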
[0018] As stated earlier, a smooth version 20 of the image is obtained by applying a low pass filter to the original image data. Average values for bright and dark pixels are then obtained for both the original and smoothed sets of image data. Looking first at the bright pixels, one value calculated will be VBPIXEL, the average value for all of the bright pixels in original pixel map 10 (Vp ≥ ts) which are located in the area covered by block 18, as indicated in step 540. Another value, VBSMOOTH, the average value for all of the bright pixels in smoothed version 20 of the image which are located in the area covered by block 18, will also be obtained as shown in step 560. Dark values are calculated similarly. That is, VDPIXEL, the average value for all of the dark pixels in original pixel map 10 (Vp < ts) which are located in the area covered by block 18, will be obtained as shown in step 550, and VDSMOOTH, the average value for all of the dark pixels in the smoothed version 20 of the image which are located in the area covered by block 18, will be obtained as in step 570. Once these average values are obtained, the distances d and ds between brighter and darker averages for pixel map 10 and smoothed image 20 respectively are calculated as indicated in step 580. That is, d = VBPIXEL - VDPIXEL, and ds = VBSMOOTH - VDSMOOTH. Since d/ds is typically almost equal to 1 for contone images, the ratio d/ds may be used to detect halftones.
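
The behavior of the ratio d/ds can be checked with a small Python sketch. It makes several assumptions for illustration only: the low pass filter is a 3x3 box filter with edge replication, the smoothed block is divided into bright and dark pixels using the same threshold as the original, and a distance of 0 is returned when either region is empty. A checkerboard stands in for a halftone screen and a gentle ramp for contone data.

```python
def box_filter(block):
    """3x3 mean filter with edge replication (a stand-in low pass filter)."""
    h, w = len(block), len(block[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                block[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(sum(window) / 9)
        out.append(row)
    return out


def bright_dark_distance(block, t_s):
    """Mean of pixels >= t_s minus mean of pixels < t_s (0 if a side is empty)."""
    flat = [v for row in block for v in row]
    bright = [v for v in flat if v >= t_s]
    dark = [v for v in flat if v < t_s]
    if not bright or not dark:
        return 0.0
    return sum(bright) / len(bright) - sum(dark) / len(dark)


halftone = [[0, 255, 0, 255], [255, 0, 255, 0], [0, 255, 0, 255], [255, 0, 255, 0]]
contone = [[100, 110, 120, 130, 140, 150]] * 4

d_h = bright_dark_distance(halftone, 128)
ds_h = bright_dark_distance(box_filter(halftone), 128)
d_c = bright_dark_distance(contone, 125)
ds_c = bright_dark_distance(box_filter(contone), 125)
```

Smoothing barely changes the ramp, so d_c/ds_c stays close to 1, whereas it flattens the checkerboard and makes d_h several times larger than ds_h.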

[0019] Figure 7 contains a detailed illustration of step 320 of Figure 4, the preferred embodiment of a process for initially classifying blocks 18. As shown, a relative comparison between d and ds is obtained as indicated in step 610 in order to determine whether the block contains contone (d ≈ ds) or halftone data. Block 18 will initially be classified as one of four types: UNIFORM, SMOOTH, WEAK EDGE or EDGE according to the magnitude of the distance d or ds. Distance d is used to classify contone blocks, while distance ds is used for halftones. For contone data d, the value from pixel map 10, is compared to value x0 as shown in step 620.

[0020] If d is very low (i.e. d < x0), all pixel values in the block are substantially the same and the block is classified as UNIFORM at step 640. If there are somewhat small differences in pixel values in the block such that x0 < d < x1 as shown in step 622, the block is classified as SMOOTH at step 650. If there are fairly large differences in pixel values in the block and x1 < d < x2 at step 624, the block will be classified as WEAK EDGE. If the differences in the block are very large and d ≥ x2 at step 624, the block will be classified as an EDGE at step 670.

[0021] If d/ds is not approximately equal to 1, ds is compared to threshold y0 at step 630. It should be noted here that two different sets of thresholds are applied for halftones and contones. Thus, on most occasions, x0 ≠ y0, x1 ≠ y1 and x2 ≠ y2. The process used to classify halftone blocks is similar to that used for contone data. Thus, if ds < y0 at step 630 the block is classified as UNIFORM at step 640. If y0 < ds < y1 at step 632, the block is classified as SMOOTH at step 650. If y1 < ds < y2 as indicated in step 634, the block is classified as a WEAK EDGE at step 660. If ds ≥ y2 at step 634, the block will be classified as an EDGE at step 670.
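
The two threshold ladders can be sketched as a single Python function. The tolerance used to decide whether d/ds is "approximately equal to 1", the numeric threshold values in the usage lines, and the choice to assign a value that exactly equals a threshold to the higher class are all assumptions; the specification leaves them open.

```python
def classify_block(d, ds, x, y, ratio_tol=0.2):
    """Return "UNIFORM", "SMOOTH", "WEAK_EDGE" or "EDGE".

    x and y are (t0, t1, t2) threshold triples for contone and halftone
    data respectively. ratio_tol is an assumed tolerance for deciding
    that d/ds is approximately 1 (contone).
    """
    is_contone = ds != 0 and abs(d / ds - 1.0) <= ratio_tol
    # Contone blocks are judged on d, halftone blocks on the filtered ds.
    value, (t0, t1, t2) = (d, x) if is_contone else (ds, y)
    if value < t0:
        return "UNIFORM"
    if value < t1:
        return "SMOOTH"
    if value < t2:
        return "WEAK_EDGE"
    return "EDGE"


X = (10, 40, 90)  # hypothetical contone thresholds x0, x1, x2
Y = (8, 30, 70)   # hypothetical halftone thresholds y0, y1, y2
contone_edge = classify_block(100, 98, X, Y)
halftone_smooth = classify_block(255, 28, X, Y)
```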

[0022] Referring now to Figures 8 and 9, the details for updating the context of the block will now be provided. The context of a block is useful when the average between the dark and bright areas of the block is relatively high. When this is the case, the block can be classified as an EDGE as long as its context is TEXT. The context is initially set equal to PICTURE. It is changed to TEXT if one of two rules is satisfied: (1) the block being processed is in a long run of UNIFORM blocks and the average of the dark pixel values in the block is greater than a preset brightness threshold; or (2) the block has been classified as either UNIFORM, WEAK EDGE, or EDGE, one of the top, left or right neighboring blocks has a context which has been set equal to TEXT, and the difference between that neighboring block and the current block is smaller than a preset propagation threshold.

[0023] Turning first to Figure 8, determining whether block context should be changed according to the first rule requires finding a run of blocks that have been classified as UNIFORM as indicated in step 704. Finding a run of UNIFORM blocks typically involves comparing the number of consecutive UNIFORM blocks to a run length threshold tLU as indicated in step 706. The run length threshold sets the number of consecutive blocks that must be classified as UNIFORM for a run to be established. As also indicated in step 706, VDPIXEL, the average value of the dark pixels for consecutive blocks, is compared to the brightness threshold ts. A large number of consecutive UNIFORM blocks with high brightness levels usually indicates that the blocks contain large background page areas (i.e. large white areas), thereby indicating that text is present. Thus, if the number of consecutive UNIFORM blocks exceeds tLU and VDPIXEL > ts, the context for the block is changed to TEXT as indicated in step 708.

[0024] If either the number of identified consecutive blocks is too small to establish a run or the blocks are dark (VDPIXEL ≤ ts), the context will remain set equal to PICTURE. Whether additional runs are present in the block will be determined as indicated in step 710, and if so the process will be repeated as indicated in the illustration.
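
The first context rule can be sketched in Python for a single row of blocks. This is one reading of the rule: a run qualifies when it is longer than the run length threshold and the mean of its blocks' dark-pixel averages exceeds the brightness threshold. The function name, the per-row processing, and the threshold values in the usage lines are assumptions.

```python
def apply_uniform_run_rule(classes, dark_means, contexts, t_lu, t_s):
    """Set context to TEXT for blocks in long, bright runs of UNIFORM blocks.

    classes, dark_means and contexts describe one row of blocks.
    Returns a new list of contexts; the inputs are left unchanged.
    """
    result = list(contexts)
    i = 0
    while i < len(classes):
        if classes[i] != "UNIFORM":
            i += 1
            continue
        # Extend j to one past the end of the current UNIFORM run.
        j = i
        while j < len(classes) and classes[j] == "UNIFORM":
            j += 1
        run_dark = sum(dark_means[i:j]) / (j - i)
        if (j - i) > t_lu and run_dark > t_s:
            result[i:j] = ["TEXT"] * (j - i)
        i = j
    return result


classes = ["EDGE", "UNIFORM", "UNIFORM", "UNIFORM", "SMOOTH", "UNIFORM"]
dark_means = [40, 240, 245, 250, 120, 250]
contexts = ["PICTURE"] * 6
updated = apply_uniform_run_rule(classes, dark_means, contexts, t_lu=2, t_s=200)
```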

[0025] Turning now to Figure 9, changing the context of a block to TEXT under the second rule first requires providing a propagation threshold tp. The propagation threshold defines the level of brightness that will indicate that the block covers blank page areas. Under the second rule, the context will be changed from picture to text at step 808 if the block is not SMOOTH (i.e. is UNIFORM, an EDGE or a WEAK EDGE) as shown in step 802, either its top, left or right neighbor has a text context as indicated in step 804, and VBDIF, the average difference between bright pixels in the block and bright pixels in the neighbor text context block, is less than tp as shown in step 806. Neighbor blocks are checked because presumably blocks that contain text will be located next to other blocks that contain text. However, the brightness value of the block is compared to that of its neighbor to assure that this is the case. In other words, even if the block has a neighboring block with a text context, a large difference between the average brightness of the block and its neighbor means that the block does not contain the large blank page areas that indicate the presence of text.
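
A minimal Python sketch of the second rule follows. It approximates the average bright-pixel difference by the absolute difference between the two blocks' bright-pixel means, which is an assumption, as are the function name and the representation of neighbors as (context, bright mean) pairs.

```python
def text_propagates(block_class, bright_mean, neighbors, t_p):
    """Return True if the second rule changes the block context to TEXT.

    neighbors: (context, bright_mean) pairs for whichever of the top,
    left and right neighboring blocks exist.
    """
    if block_class == "SMOOTH":
        return False
    for context, neighbor_bright in neighbors:
        # A TEXT neighbor only counts if its background brightness is similar.
        if context == "TEXT" and abs(bright_mean - neighbor_bright) < t_p:
            return True
    return False


neighbors = [("PICTURE", 180), ("TEXT", 240)]
fires = text_propagates("WEAK_EDGE", 235, neighbors, t_p=10)
```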

[0026] Again, the present invention is directed to segmenting the data by first identifying blocks that contain the edges of the image and then separating the blocks such that those which contain the smooth data and bright sides of the edges are placed on the lower plane and the dark sides of the edges are placed on the upper plane. Once each of the respective planes is generated, ordinary MRC processing continues. That is, each plane is compressed using an appropriate compression technique. In the currently preferred embodiment, upper plane 12 and lower plane 14 are compressed using JPEG while the selector plane 16 is compressed using a symbol based pattern matching technique such as CCITT Group IV or a method of classifying scanned symbols into equivalence classes such as that described in US-A 5,778,095 to Davies issued July 7, 1998, the contents of which are hereby incorporated by reference. The planes are then joined together and transmitted to an output device, such as a facsimile machine or storage device.


Claims 
1. A method of segmenting digital image data for mixed raster content processing, comprising:

a) acquiring a block of the digital image data, wherein the digital image data is composed of light intensity signals in discrete locations;

b) designating a classification for said block and providing an indication about a context of said block;

c) segmenting said light intensity signals in said block into an upper subset and a lower subset based upon said designated classification;

d) generating a selector set which tracks said light intensity segmentation; and

e) separately compressing the digital image data contained in said upper and lower subsets.

2. A method of segmenting digital image data as claimed in claim 1, wherein said classification indicates that said block contains substantially smooth data and/or substantially edge data.

3. A method of segmenting digital image data as claimed in claim 1 or claim 2, wherein said classification data designating step further comprises:

a) measuring an amount of light intensity signal variation throughout said block;

b) assigning a classification to said block based upon said measured light intensity signal variation; and

c) updating said context indication for said block, and designating classification for said block based upon said updated context.

4. A method of segmenting digital image data as claimed in any of the preceding claims, further comprising:

a) dividing a low pass filtered version of said block into a bright region and a dark region;

b) calculating average filtered light intensity values for said bright region and for said dark region; and

c) obtaining a difference in average filtered light intensity values between said bright region and said dark region.

5. A method of segmenting a block of digital image data into an upper and lower subset, wherein the block of data is composed of light intensity signals in discrete locations, comprising:

a) determining whether the block is located on an edge in the digital image;

b) if the block is on an edge, comparing a magnitude of each light intensity signal in the block to a brightness threshold and placing said signal in the upper subset if said light intensity magnitude exceeds said brightness threshold or in the lower subset if said light intensity magnitude is less than said brightness threshold; and

c) if the block is not located on an edge, placing the block in the upper subset if the block is in a group of blocks that have light intensity values which are indicative of smooth and dark image data, and otherwise placing the block in the lower subset.

6. A method of classifying a block of digital image data into one of a plurality of image data types, wherein the block of data is composed of light intensity signals in discrete locations, comprising:

a) dividing the block into a bright region and a dark region;

b) dividing a low pass filtered version of said block into a bright region and a dark region;

c) calculating average light intensity values for each of said bright region, said dark region, said filtered bright region and said filtered dark region; and

d) comparing a difference between said bright region and said dark region average light intensity values to a filtered difference between said bright region and said dark region average filtered light intensity values;

e) if said average light intensity difference and said average filtered light intensity difference are approximately equal, finding a range of values in which said difference value falls, and classifying said block based upon said value range; and

f) if said average light intensity difference and said average filtered light intensity difference are not approximately equal, finding a range of values in which said filtered difference value falls and classifying said block based upon said filtered value range.

7. A method according to any of claims 1 to 4, wherein blocks are classified by a method according to claim 5 or claim 6.